ABSTRACT

A simple scheme is proposed for smoothly
approximating the ability distribution for relatively long tests,
assuming that the item characteristic curves (ICCs) are known or well
estimated. The scheme works for a general class of ICCs and is
guaranteed to completely recover the theta distribution as the test
length increases. The proposed method of estimating the ability
distribution is robust to some violations of local independence.
After an initial function inversion, the scheme can be inexpensively
used to recover the theta distribution in each of several different
administrations of the same test or several subpopulations in one
test administration. Moreover, this approach could be used to recover
the distribution of a dominant ability dimension when local
independence fails. The scheme provides a starting place for
diagnostics concerning assumptions about the shape of the theta
distribution or ICCs of a particular test. Work is currently under
way to further examine and refine these methods using essentially
unidimensional simulation data and to apply the estimator to real
tests. Kernel smoothing is also considered. A 16-item list of
references, 10 tables, 8 graphs, and 2 appendixes that provide
details of the simulation and proofs are included. (RLC)
Abstract

We propose a simple scheme for smoothly approximating the ability distribution
for relatively long tests, assuming that the ICCs are known or well estimated.
The scheme works for quite a general class of item characteristic curves (ICCs)
and is guaranteed to completely recover the $\theta$ distribution as the test length,
$J$, grows. After an initial function inversion, the scheme can be inexpensively
used to recover the $\theta$ distribution in each of several different administrations
of the same test (or subpopulations in one test administration). Moreover, this
approach could be used to recover the distribution of a dominant ability dimension
when local independence fails. Finally, the scheme provides a starting place
for diagnostics concerning assumptions about the shape of the $\theta$ distribution or
ICCs of a particular test. Work is currently underway to further examine and
refine these methods using essentially unidimensional simulation data, and to
apply the estimators to real tests.
1 The basic estimator

A principal application of educational testing is inferring the distribution of abilities in
various populations. This task is important both for users of these tests (in, say, comparing
various subpopulations) and for researchers and test developers (in, say, developing or using
item calibration procedures, i.e. ICC parameter estimation procedures, within the IRT framework).

Inference about the ability distribution from item response data goes back at least to
Lord (1953), who gives an interesting qualitative account of the possible distortions induced
by the traditional IRT model. With the rise in popularity of item response theory (IRT),
many methods for estimating the latent distribution have been developed.

Samejima and Livingston (1979) fit polynomials to latent densities using the method of
moments. Samejima (1984) also fits $\theta$ densities, given the MLE $\hat\theta$, using specific parametric
families by matching two or more moments. Levine (1984, 1985) projects the (unknown)
latent distribution onto a convenient function space in the span of the test's conditional
likelihood functions and estimates the projection by maximum likelihood. Mislevy (1984)
assumes that the ability distribution is well approximated by a collection of masses centered
at points placed a priori along the $\theta$ axis and estimates the sizes of the masses at each
point. More generally, hierarchical and/or empirical Bayes techniques may be used to estimate
parameters of the latent trait distribution if it belongs to a tractable family of priors.
These methods all rely upon local independence for their validity; moreover, they tend to be
expensive in terms of computation and storage.

We will examine a simpler method of estimating the ability distribution which, in addition,
is robust to some violations of local independence. Consider a set of $J$ binary items

$$\underline{X}_J = (X_1, X_2, \ldots, X_J)$$

that may be embedded in a longer sequence or pool of items $(X_1, X_2, X_3, \ldots)$. Let $\Theta$ be the
latent trait of interest, let $P_1(\theta), P_2(\theta), \ldots, P_J(\theta)$ be the item characteristic curves (ICCs)
with respect to $\Theta$, and denote averages of items as $\bar{X}_J = \frac{1}{J}\sum_{j=1}^{J} X_j$, and similarly for averages
$\bar{P}_J(\theta)$ of ICCs. Under the usual local independence (LI) and monotonicity (M) conditions
of item response theory (e.g. Hambleton, 1989), or more generally under Stout's (1990)
formulation of essential independence (EI) and local asymptotic discrimination (LAD), we
know that $\hat\theta_J(\underline{X}_J) \equiv \bar{P}_J^{-1}(\bar{X}_J)$ is a plausible point estimate of $\Theta$: $\hat\theta_J(\underline{X}_J)$ is a consistent
estimator of $\Theta$ under either set of assumptions. It immediately follows that the distribution
of $\hat\theta_J(\underline{X}_J)$,

$$F_J(t) = P[\hat\theta_J(\underline{X}_J) \le t],$$

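converges to that of $\Theta$ as well (e.g. Serfling, 1980, p. 19). Now consider administering the
test $\underline{X}_J$ to $N$ examinees, obtaining $N$ response vectors $\underline{X}_{J1}, \ldots, \underline{X}_{JN}$ and corresponding $\theta$
estimates $\hat\theta_J(\underline{X}_{J1}), \ldots, \hat\theta_J(\underline{X}_{JN})$; a natural estimator of the $\theta$ distribution is the "empirical"
distribution of these $\hat\theta_J$'s,

$$F_{N,J}(t) = \frac{1}{N}\sum_{n=1}^{N} 1_{\{\hat\theta_J(\underline{X}_{Jn}) \le t\}} = \{\text{fraction of } \hat\theta_J(\underline{X}_{Jn})\text{'s} \le t\}, \qquad (1)$$

where the "indicator function" $1_S$ takes the value 1 if $S$ is true and 0 if $S$ is false.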
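The construction in (1) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' program: it assumes Rasch ICCs with equally spaced difficulties, replaces the inverter by plain bisection, and clamps scores that the average ICC cannot attain on the search interval. All function names are hypothetical.

```python
import math
import random

def rasch_icc(theta, b):
    """One-parameter logistic ICC with difficulty b (illustrative choice)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def average_icc(theta, difficulties):
    """Average ICC: the mean of the individual ICCs at theta."""
    return sum(rasch_icc(theta, b) for b in difficulties) / len(difficulties)

def invert_average_icc(score, difficulties, lo=-10.0, hi=10.0, tol=1e-8):
    """Solve average_icc(theta) = score by bisection on [lo, hi].

    Scores outside the range attained on [lo, hi] are clamped to the
    nearest endpoint (a crude stand-in for an estimate that diverges).
    """
    if score <= average_icc(lo, difficulties):
        return lo
    if score >= average_icc(hi, difficulties):
        return hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if average_icc(mid, difficulties) < score:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def edf(estimates, t):
    """Empirical distribution function: fraction of estimates <= t."""
    return sum(1 for e in estimates if e <= t) / len(estimates)

rng = random.Random(0)
J, N = 60, 2000
difficulties = [-2.0 + 4.0 * j / (J - 1) for j in range(J)]
# One inversion per attainable score j/J, reused for every examinee.
table = [invert_average_icc(j / J, difficulties) for j in range(J + 1)]

estimates = []
for _ in range(N):
    theta = rng.gauss(0.0, 1.0)  # abilities drawn from N(0, 1)
    raw = sum(rng.random() < rasch_icc(theta, b) for b in difficulties)
    estimates.append(table[raw])

print(edf(estimates, 0.0))  # should be near F(0) = 0.5 for this symmetric setup
```

Because the estimate depends on the responses only through the raw score, the inversion is done once per attainable score and then reused for every examinee.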

Theorem 1 Suppose $(X_1, X_2, \ldots)$ is a sequence of items and $\Theta$ is a latent trait such that
EI and LAD hold. Define $\hat\theta_J(\underline{X}_J)$ as above. If the distribution function

$$F(t) = P[\Theta \le t]$$

is continuous, the empirical distribution function $F_{N,J}(t)$ defined in (1) converges in
probability to $F$ at each $t$ as both $J \to \infty$ and $N \to \infty$.

As with the work of Stout (1990) and Junker (1991), the embedding in an infinite-length 
item pool is partly a conceptual tool. In practice, one might check the EI condition using 
Stout's (1987) test, and check the LAD condition by verifying that the average ICC for a 
particular test was an invertible function. 
In fact, the full strength of the LAD condition is not needed here. A weaker condition
that also gives the theorem is that, for all $t_2 > t_1$, there exists $\epsilon(t_1, t_2)$ such that

$$\liminf_{J\to\infty}\, [\bar{P}_J(t_2) - \bar{P}_J(t_1)] > \epsilon(t_1, t_2). \qquad (2)$$

Similarly, the full strength of the EI condition is not needed. It suffices to have, for all $t$,

$$\lim_{J\to\infty} \mathrm{Var}(\bar{X}_J \mid \Theta = t) = 0. \qquad (3)$$

Under the weaker conditions (2) and (3), the consistency of $\hat\theta_J(\underline{X}_J)$ as a point estimate
for $\Theta$ may fail, but Theorem 1 still goes through. The proof of Theorem 1 is based on a
well-known exponential bound due to Dvoretzky, Kiefer and Wolfowitz (Serfling, 1980, p.
59) on the error made in approximating $F_J(t)$ with $F_{N,J}(t)$. See Appendix B for some details.
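To get a feel for the size of this bound, the following sketch evaluates the Dvoretzky-Kiefer-Wolfowitz inequality in the form $P(\sup_t |F_{N,J}(t) - F_J(t)| > \epsilon) \le 2\exp(-2N\epsilon^2)$. The explicit constant 2 is Massart's later sharpening and is an assumption relative to the version cited from Serfling, where the constant is left unspecified; the function name is hypothetical.

```python
import math

def dkw_bound(n_examinees, eps):
    """Upper bound on P(sup_t |F_N(t) - F(t)| > eps) for N i.i.d. draws.

    Uses the constant 2 (Massart's form of the DKW inequality) and caps
    the result at 1, since it bounds a probability.
    """
    return min(1.0, 2.0 * math.exp(-2.0 * n_examinees * eps ** 2))

for n in (1000, 5000, 10000):
    print(n, dkw_bound(n, 0.02))
```

The bound involves only $N$, which is why the error from sampling examinees can be handled separately from the error due to finite test length $J$.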

2 Two practical considerations 

Note that the theorem does not in any way require that the ICCs have 0 and 1 as lower and
upper asymptotes. For example, if $\bar{P}_J$ has a lower asymptote $c$, i.e.,

$$\liminf_{J\to\infty} \bar{P}_J(t) \ge c > 0, \quad \forall t \in \mathbb{R},$$

there certainly could be positive probability that some $\underline{X}_J$'s have $\bar{X}_J < c$. The only
reasonable thing for $\bar{P}_J^{-1}$ to do with such an $\bar{X}_J$ is send it to $-\infty$, which ruins the estimate of
$F$.

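But for any fixed $\theta$, if $c < \liminf_{J\to\infty} \bar{P}_J(\theta)$,

$$\limsup_{J\to\infty} P[\bar{X}_J < c] = \limsup_{J\to\infty} \int_{-\infty}^{\infty} P[\bar{X}_J < c \mid \Theta = t]\, dF(t)$$

$$\le \limsup_{J\to\infty} \int_{-\infty}^{\infty} P[\bar{X}_J < \bar{P}_J(\theta) \mid \Theta = t]\, dF(t) \le F(\theta),$$

after observing that $P[\bar{X}_J < \bar{P}_J(\theta) \mid \Theta = t] \to 1_{\{t < \theta\}}$ and applying standard convergence
results (Ash, 1972). By letting $\theta \to -\infty$ it follows that

$$\lim_{J\to\infty} P[\bar{X}_J < c] = 0.$$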



The distribution of $\hat\theta_J(\underline{X}_J)$ does indeed place mass at $-\infty$ for some scores (e.g., for $\bar{X}_J = 0$)
and fails to "recover" the $\theta$ distribution for those scores. The point of the calculation
is that as $J$ grows, the part of the $\theta$ distribution corresponding to these "bad" scores
becomes negligible, so we don't have to worry, theoretically, about its not being recovered.
Indeed, under local independence, we can further calculate that $P[\bar{X}_J < c]$ falls off essentially
geometrically as $J \to \infty$ (Hoeffding 1963, p. 15).
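The geometric decay can be illustrated numerically. The sketch below makes a simplifying assumption that is not in the text: all $J$ items share one success probability $p$ at a fixed ability, so the raw score is binomial. It compares the exact probability of $\bar{X}_J < c$ with Hoeffding's bound $\exp(-2J(p - c)^2)$; the function names and the values $p = 0.4$, $c = 0.2$ are illustrative.

```python
import math

def exact_lower_tail(J, p, c):
    """P[X_bar < c] when X_bar is the mean of J i.i.d. Bernoulli(p) items."""
    return sum(math.comb(J, k) * p ** k * (1.0 - p) ** (J - k)
               for k in range(J + 1) if k / J < c)

def hoeffding_bound(J, p, c):
    """Hoeffding's bound exp(-2 J (p - c)^2) on P[X_bar <= c], for c < p."""
    return math.exp(-2.0 * J * (p - c) ** 2)

p, c = 0.4, 0.2  # illustrative success probability and lower asymptote
for J in (10, 30, 60, 100):
    print(J, exact_lower_tail(J, p, c), hoeffding_bound(J, p, c))
```

Both columns shrink by a roughly constant factor for each fixed increase in $J$, which is the sense in which the "bad" scores become negligible.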

However in practice we still must be concerned about $\bar{X}_J$'s below a lower asymptote $c$,
or above an upper asymptote $d$. In the pilot simulation described below we have made two
adjustments for this problem. Our first adjustment replaces the basic point estimate $\hat\theta_J$ with
an estimator based on a shrunken $\bar{X}_J$:

$$\hat\theta_J^{*}(\underline{X}_J) = \bar{P}_J^{-1}\!\left(\frac{J\,\bar{X}_J + 1}{J + 2}\right).$$

This estimator also converges in distribution to $\Theta$, and it is evidently bounded (for fixed $J$)
if the asymptotes of $\bar{P}_J$ are 0 and 1. Our second adjustment is in the numerical inversion
of the function $\bar{P}_J$ on the computer. We have written the inverter (a secant variation of
Newton's method) so that it finds a root of a linear extrapolation of $\bar{P}_J(t) - \bar{X}_J$ when $\bar{X}_J$
lies outside the asymptotes of $\bar{P}_J$. This adjustment is innocuous asymptotically.
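A rough sketch of the two adjustments follows. It is not the authors' inverter: bisection replaces the secant iteration, and the working interval $[-4, 4]$, the finite-difference step, and all names are assumptions made for illustration. Outside the range the average ICC attains on that interval, the score is mapped along the straight line that continues the curve at the nearest endpoint.

```python
import math

def icc_3pl(theta, a, b, c):
    """Three-parameter logistic ICC."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def average_icc(theta, items):
    """Mean of the item ICCs; items is a list of (a, b, c) tuples."""
    return sum(icc_3pl(theta, a, b, c) for a, b, c in items) / len(items)

def shrunken_score(raw, J):
    """Shrunken proportion correct (raw + 1) / (J + 2), strictly inside (0, 1)."""
    return (raw + 1.0) / (J + 2.0)

def invert_with_extrapolation(score, items, lo=-4.0, hi=4.0, tol=1e-8):
    """Invert the average ICC on [lo, hi]; beyond the values attained there,
    follow a straight line with the curve's slope at the nearest endpoint."""
    p_lo, p_hi = average_icc(lo, items), average_icc(hi, items)
    eps = 1e-3  # step for the one-sided finite-difference slope
    if score < p_lo:
        slope = (average_icc(lo + eps, items) - p_lo) / eps
        return lo + (score - p_lo) / slope
    if score > p_hi:
        slope = (p_hi - average_icc(hi - eps, items)) / eps
        return hi + (score - p_hi) / slope
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if average_icc(mid, items) < score:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

J = 10
items = [((0.5, 1.0, 1.5)[j % 3], -2.0 + 4.0 * j / (J - 1), 0.2) for j in range(J)]
# Lookup table: one bounded estimate per raw score 0, 1, ..., J.
table = [invert_with_extrapolation(shrunken_score(raw, J), items)
         for raw in range(J + 1)]
for raw, est in enumerate(table):
    print(raw, est)
```

With a lower asymptote of 0.2, the smallest shrunken scores still fall below the curve, so it is the extrapolation step rather than the shrinkage that keeps those estimates finite.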

Finally, note that this method (like others) requires "perfect" knowledge of the ICCs.
In practice of course one never knows the ICCs perfectly, so it is important to know what
happens if the "wrong" ICCs are used in the definition of $\hat\theta_J$. For example, how sensitive
is this method to using estimates of the item parameters in a 3PL (three parameter logistic
ICC) model, instead of the true parameters; or how far off is the estimated $\theta$ distribution if
the true ICCs are 3PLs, but only Rasch ICCs are used to calculate $\hat\theta_J$?
Theorem 2 Suppose $X_1, X_2, \ldots$ and $\Theta$ are as in Theorem 1 with ICCs $P_1(t), P_2(t), \ldots$
with average $\bar{P}_J(t)$ as before, and suppose

$$R_1(t), R_2(t), \ldots$$

are another set of ICCs, with average $\bar{R}_J(t)$. Let $\bar{P}_J^{-1}$ and $\bar{R}_J^{-1}$ be the corresponding inverses,
and let

$$\hat\theta_J(\underline{X}_J) = \bar{R}_J^{-1}(\bar{X}_J).$$

Fix $\theta$ such that $\bar{P}_J^{-1}\bar{R}_J(\theta)$ has a finite limit $r(\theta)$. Then

$$F_J(\theta) = P[\hat\theta_J(\underline{X}_J) \le \theta] \to F(r(\theta))$$

(where $F$ is the distribution of $\Theta$). If these hypotheses hold for every $\theta$, and if $r$ and $F$ are
continuous functions, then the convergence is uniform in $\theta$.

The existence of the limit $r(\theta)$ is a technical requirement that, like LAD, is innocuous in
the context of real, finite length tests. The most useful interpretation of Theorem 2 is that

$$F_J(\theta) - F(\bar{P}_J^{-1}[\bar{R}_J(\theta)]) \to 0$$

as $J \to \infty$, i.e., the distribution of $\Theta$ is estimated with a distortion $\bar{P}_J^{-1}\bar{R}_J$. This follows
from the theorem if $F$ is continuous at $r(\theta)$.

The proof of Theorem 2 expands on the technique used to prove convergence of $F_J(\theta)$ to
$F(\theta)$; see Appendix B. Just as in Theorem 1 it is also possible to show that the empirical
distributions

$$F_{N,J}(\theta) = \frac{1}{N}\sum_{n=1}^{N} 1_{\{\hat\theta_J(\underline{X}_{Jn}) \le \theta\}}$$

converge to $F(r(\theta))$.

The value of Theorem 2 is that if the function $\bar{P}_J^{-1}[\bar{R}_J(\theta)]$ can be (partially) identified,
then the distribution of $\hat\theta_J$ can still tell us a lot about the underlying $\theta$ distribution. For
example, if the "true ICCs" are $P_j(\theta)$ and the $\theta$ distribution is recovered with "estimated
ICCs" $R_j(\theta)$, with the estimated ICCs satisfying

$$\bar{R}_J(\theta) - \bar{P}_J(\theta) \to 0$$

as $J \to \infty$, then the estimated distributions $F_J$ will converge to the true distribution $F$ of
$\Theta$, as long as the derivative $\bar{P}'_J(\theta)$ is bounded away from zero at each $\theta$ as $J \to \infty$ (this is
guaranteed by LAD for example).

Some knowledge of the underlying $\theta$ distribution may even be available when the "true
ICCs" $P_j(\theta)$ and the "recovery ICCs" $R_j(\theta)$ do not match up asymptotically. For example,
it is easy to check numerically that for "typical" parameter values, averages of logistic
ICCs are themselves approximately logistic (with parameters approximately the averages of
the discrimination and difficulty parameters of the individual ICCs). Thus for example if
the $P_j(\theta)$ are Rasch (one-parameter logistic) and the estimation method for the "difficulty
parameters" $b_j$ is known, on average, to bias the $b_j$ by some fixed but unknown additive
bias parameter $\beta$ (so that $\mathrm{logit}\, R_j(\theta) \approx \mathrm{logit}\, P_j(\theta) + \beta$) then roughly $\bar{P}_J^{-1}(\bar{R}_J(\theta)) \approx \alpha\theta + \beta$,
with $\alpha$ near 1, so that the location of the $\theta$ distribution will be estimated wrongly but
the (shape) family to which it belongs may still be identified. Similar considerations apply
when the $P_j(\theta)$ are 3PL, and the $R_j(\theta)$ are 2PL: over the domain of $\bar{P}_J^{-1}$, $\bar{P}_J^{-1}(\bar{R}_J(\theta))$ is
approximately linear.
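The Rasch example can be checked numerically. In the sketch below the recovery difficulties are taken to be the true ones shifted by a fixed $\beta$, so that $\mathrm{logit}\, R_j(\theta) = \mathrm{logit}\, P_j(\theta) + \beta$ holds exactly; in that idealized case the distortion is a pure location shift. The value $\beta = 0.3$, the test length, and all function names are illustrative assumptions.

```python
import math

def rasch_icc(theta, b):
    """One-parameter logistic ICC with difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def average_icc(theta, difficulties):
    """Mean of the Rasch ICCs at theta."""
    return sum(rasch_icc(theta, b) for b in difficulties) / len(difficulties)

def invert(score, difficulties, lo=-20.0, hi=20.0, tol=1e-10):
    """Bisection solve of average_icc(theta) = score on [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if average_icc(mid, difficulties) < score:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def distortion(theta, true_b, recovery_b):
    """The map theta -> P_bar^{-1}(R_bar(theta)) from Theorem 2."""
    return invert(average_icc(theta, recovery_b), true_b)

J, beta = 30, 0.3
true_b = [-2.0 + 4.0 * j / (J - 1) for j in range(J)]
recovery_b = [b - beta for b in true_b]  # logit R_j = logit P_j + beta

for theta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(theta, distortion(theta, true_b, recovery_b))
```

Each printed value differs from its $\theta$ by the same constant, which is the location-only distortion described above.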

3 Kernel smoothing 

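The basic estimator proposed in (1) is the "empirical distribution" function

$$F_{N,J}(t) = \frac{1}{N}\sum_{n=1}^{N} 1_{\{\hat\theta_J(\underline{X}_{Jn}) \le t\}} = \sum_{j=0}^{J} \hat{P}_N[\bar{X}_J = j/J]\; 1_{\{\bar{P}_J^{-1}(j/J) \le t\}}, \qquad (4)$$

where

$$\hat{P}_N[\bar{X}_J = j/J] = \frac{1}{N}\sum_{n=1}^{N} 1_{\{\bar{X}_{Jn} = j/J\}}$$

is the natural estimator of the (discrete) distribution of $\bar{X}_J$ based on $N$ observations $\bar{X}_{J1},
\ldots, \bar{X}_{JN}$. The indicator function on the far right in (4) may be written

$$1_{\{\bar{P}_J^{-1}(j/J) \le t\}} = H\!\left(\frac{t - \bar{P}_J^{-1}(j/J)}{h}\right),$$

where $H(u)$ is constant, except for a jump from 0 to 1 at $u = 0$, and $h$ is any positive
number. In cases where the $\theta$ distribution is continuous, the performance of the estimator
may be improved by changing from $H$ to a smooth distribution function $K$. Write the smoothed
estimator as

$$F_{N,J,h}(t) = \sum_{j=0}^{J} \hat{P}_N[\bar{X}_J = j/J]\; K\!\left(\frac{t - \bar{P}_J^{-1}(j/J)}{h}\right). \qquad (5)$$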

This estimator is in the same spirit as kernel estimators for estimating the density of
a continuous random variable $V$ based on independent observations $V_1, V_2, \ldots, V_N$,
but it differs from them in three ways.

First, our estimator $F_{N,J,h}$ is a distribution function estimator rather than a density
estimator.

Second, we are not allowed direct access to the observations $\theta_1, \ldots, \theta_N$. We must base
our estimation of $F$ on the discrete, noisy transformations $\hat\theta_J(\underline{X}_{J1}), \ldots, \hat\theta_J(\underline{X}_{JN})$ of $\theta_1, \ldots, \theta_N$.
Note that the "granularity" of these observations changes with $J$.

Third, the observations $\bar{X}_{J1}, \ldots, \bar{X}_{JN}$ must be transformed by the nonlinear
transformation $\bar{P}_J^{-1}$. This means that the granularity changes over the range of $\theta$ and $\bar{X}_J$, which
complicates practical calculations, such as those leading to optimal rates for $h$, $N$ and $J$.

We now show that the weighted root mean square error (RMS) between this estimator
and the true $\theta$ distribution goes to zero as $N, J \to \infty$. The theorem below is analogous to
Theorem 1.

Theorem 3 Suppose $(X_1, X_2, \ldots)$ and $\Theta$ are as in Theorem 1, with ICCs $P_1(t), P_2(t), \ldots$.
Define $F_{N,J,h}(t)$ as in (5), for fixed $N$, $J$ and $h$. If the distribution
function $F$ of $\Theta$ is continuous, and $K$ has a finite absolute moment, then

$$\mathrm{RMS} \equiv \left\{ E \int_{-\infty}^{\infty} [F_{N,J,h}(t) - F(t)]^2\, g(t)\, dt \right\}^{1/2} \to 0 \qquad (6)$$

as $N \to \infty$, $J \to \infty$ and $h \to 0$, for any density $g(t)$.

Unlike most nonparametric density estimation results, there is no restriction on the rates
at which $h \to 0$, $N \to \infty$ or $J \to \infty$. This is partly because a distribution function is
smoother than, and therefore easier to estimate than, a density. The corresponding technique
for estimation of the $\theta$ density would require $h$ to tend to zero more slowly than [...],
for example, as well as further conditions on the rates at which $N$ and $J$ tend to $\infty$.
Despite the fact that there are no rates in the theorem, devising $h$ as a function of $N$ and $J$
to produce the "right" amount of smoothing is an important issue to which we shall return
below.

The proof of Theorem 3 (see Appendix B) is based on decomposing the RMS in (6) as

$$\mathrm{RMS}^2 = \int_{-\infty}^{\infty} \{P[\hat\theta_J(\underline{X}_J) + hY \le t] - F(t)\}^2\, g(t)\, dt + \frac{1}{N}\int_{-\infty}^{\infty} \mathrm{Var}\, K\!\left(\frac{t - \hat\theta_J(\underline{X}_J)}{h}\right) g(t)\, dt, \qquad (7)$$

where $Y$ is a random variable with distribution $K$, independent of $\Theta$ and all item responses.
This technique can be modified to show that

$$E[F_{N,J,h}(t) - F(t)]^2 \to 0$$

for any $t$, and hence $F_{N,J,h}(t) \to F(t)$ in probability for each $t$. The decomposition of RMS
in (7) into a squared-bias term and a variance term also suggests that the optimal $h$ should be
driven by $J$ rather than by $N$: when $N$ is relatively large, the coarse granularity inherent in
$\bar{P}_J^{-1}(\bar{X}_J)$ should predominate over the finer granularity inherent in observing $N$ examinees.

A workable approach to setting $h$ is to make a quick, crude estimate of the variance of $\Theta$,
using the lower asymptote $c$ and the upper asymptote $d$ of $\bar{P}_J$, and then applying the formula

$$h = C\, J^{-1/5}\, (\mathrm{Var}\,\Theta)^{1/2}, \qquad (8)$$

which seems appropriate when $K$ has a variance (Silverman, 1986; Reiss, 1981).
Our crude estimate of $\mathrm{Var}\,\Theta$ is obtained by tabulating values of $\hat\theta^{*} = \bar{P}_J^{-1}((j+1)/(J+2))$
for all $j$ such that $c < (j+1)/(J+2) < d$, and setting

$$(\mathrm{Var}\,\Theta)^{1/2} \approx (.7413)(\text{interquartile range})$$

(following the relationship between interquartile range and standard deviation for the Normal
distribution). Preliminary trials with several values of $C$ in (8), among them 1/3 and 1/4,
indicated that $C = 1/3$ produced the best RMS results.
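The window-width rule can be sketched as follows. Two points are assumptions rather than statements from the text: the interquartile range is taken over the observed ability estimates (the text does not say exactly how the tabulated values are weighted), and the exponent on $J$ is read as $-1/5$. The function names are hypothetical.

```python
import statistics

def crude_sd(estimates):
    """0.7413 times the interquartile range; this equals the standard
    deviation when the values are Normally distributed."""
    q1, _, q3 = statistics.quantiles(estimates, n=4)
    return 0.7413 * (q3 - q1)

def window_width(estimates, J, C=1.0 / 3.0):
    """h = C * J^(-1/5) * (crude standard deviation), as in formula (8)."""
    return C * J ** (-0.2) * crude_sd(estimates)

# Illustrative input: 1000 evenly spaced quantiles of the standard Normal.
normal = statistics.NormalDist()
estimates = [normal.inv_cdf((i + 0.5) / 1000) for i in range(1000)]
print(crude_sd(estimates), window_width(estimates, J=60))
```

The factor 0.7413 is the reciprocal of 1.349, the interquartile range of a standard Normal, so the crude estimate is close to 1 for this input.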

There is reason to believe [...] (6) (Silverman, 1986, pp. 42-43). The $K$ used in our
simulations was

$$K(t) = \int_{-\infty}^{t} \tfrac{3}{4}(1 - u^2)\, 1_{\{|u| \le 1\}}\, du = \begin{cases} 0, & t < -1 \\ \tfrac{1}{4}(3t - t^3 + 2), & |t| \le 1 \\ 1, & t > 1. \end{cases} \qquad (9)$$

This choice is conservative about the tails of the $\theta$ distribution.
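A minimal sketch of the kernel in (9) and of the resulting smoothed distribution estimate is given below. It assumes the closed form $\frac{1}{4}(3t - t^3 + 2)$ on $[-1, 1]$ and averages the kernel over individual ability estimates, which is equivalent to the sum over raw scores weighted by their observed frequencies. Function names are illustrative.

```python
def kernel_cdf(t):
    """Distribution function K: 0 below -1, (3t - t^3 + 2)/4 on [-1, 1], 1 above 1."""
    if t < -1.0:
        return 0.0
    if t > 1.0:
        return 1.0
    return 0.25 * (3.0 * t - t ** 3 + 2.0)

def kde_distribution(estimates, t, h):
    """Smoothed estimate of F(t): average of K((t - estimate) / h)."""
    return sum(kernel_cdf((t - e) / h) for e in estimates) / len(estimates)

def edf(estimates, t):
    """Unsmoothed estimate of F(t): fraction of estimates <= t."""
    return sum(1 for e in estimates if e <= t) / len(estimates)

sample = [-1.0, 0.0, 1.0]
for t in (-0.25, 0.0, 0.25):
    print(t, edf(sample, t), kde_distribution(sample, t, h=0.5))
```

Differentiating the middle branch gives $\frac{3}{4}(1 - t^2)$, a density supported on $[-1, 1]$, so each estimate spreads its mass only over a window of half-width $h$.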



4 Computer simulation 

The estimators proposed in Theorems 1 through 3 are less complicated than distribution
estimators currently in use in IRT. To help evaluate these estimators a pilot simulation
study was performed. In this simulation, item response data were generated using various
parametric models, and we attempted to recover the ability distribution using both
the smoothed and unsmoothed estimators.



Monte Carlo trials:      $M = 100$

Examinee sample size:    $N = 5{,}000$

Ability distribution:    Normal: $N(0, 1)$
                         Bimodal mixture: $\frac{1}{2} N(-1.5, 1) + \frac{1}{2} N(1.5, 1)$
                         Discontinuous: $\chi^2_1 - 1$

Test length:             $J = 10, 30, 60, 100$

ICC type:                Rasch: $b_j$'s equally spaced from -2 to 2
                         3PL: $b_j$'s equally spaced from -2 to 2;
                              $a_j$'s cycling through 0.5, 1.0, 1.5;
                              $c_j$'s all set to 0.2
                         'Estimated': Generated with the 3PL ICCs above;
                              estimated with the ICC parameters
                              $\beta_j \sim N(b_j, 1/J)$,
                              $\alpha_j \sim N(a_j, 0.25)$,
                              $\gamma_j \sim \max\{N(0.2, 0.1), 0\}$
                              (all independent).

Table 1: Monte Carlo simulation parameters.
The parameters of the pilot simulation are indicated in Table 1. All possible combinations
of these parameters were investigated. The choice of ability distributions was intended to
examine two "typical" and one "worst case" target distribution. While the standard normal
distribution is extremely smooth and has a bounded positive density, the distribution of the
shifted chi-squared random variable $\chi^2_1 - 1$ puts no mass below $\theta = -1$ and the density jumps
from 0 to $+\infty$ at $\theta = -1$. (This choice is not intended to be terribly realistic, but allows
us to explore the performance of our distribution estimator under adverse circumstances.)
Although the means of these distributions are both 0, the chi-squared distribution has twice
the variance of the normal. The bimodal mixture was chosen to represent a situation where
two radically different types of examinee take the test. Its standard deviation is also greater
than 1 (roughly 1.8).
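The moments quoted above can be verified with a short calculation. The sketch assumes equal mixing weights for the two Normal components, an assumption that reproduces the stated standard deviation of roughly 1.8, and uses the standard facts that a chi-squared variable with one degree of freedom has mean 1 and variance 2. The helper name is hypothetical.

```python
import math

def mixture_moments(weights, means, variances):
    """Mean and variance of a finite mixture, from the component moments."""
    mean = sum(w * m for w, m in zip(weights, means))
    second = sum(w * (v + m * m) for w, m, v in zip(weights, means, variances))
    return mean, second - mean * mean

# Bimodal mixture, assuming equal weights on N(-1.5, 1) and N(1.5, 1).
mix_mean, mix_var = mixture_moments([0.5, 0.5], [-1.5, 1.5], [1.0, 1.0])
print(mix_mean, math.sqrt(mix_var))

# Shifted chi-squared with 1 degree of freedom: mean 1 - 1, variance 2.
chi_mean, chi_var = 1.0 - 1.0, 2.0
print(chi_mean, chi_var)
```

Under these assumptions the mixture variance is $1 + 1.5^2 = 3.25$, whose square root is about 1.80.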

The ICCs used were all subfamilies of the three parameter logistic (3PL) curves:

$$P_j(t) = c_j + (1 - c_j)\,[1 + \exp\{-a_j(t - b_j)\}]^{-1}.$$

In the case labelled "Rasch", $a_j \equiv 1$, $c_j \equiv 0$ and the $b_j$ are as indicated. The same ICCs
were used to recover $F$ as to generate the data. Indeed $\hat\theta_J$ is exactly the MLE for $\theta$
under the Rasch model with known item parameters. Similarly for the 3PL case, where all
the parameters were allowed to vary as indicated above; now $\hat\theta_J$ is a somewhat inefficient
estimator of $\theta$. In the case labelled 'Estimated', the 3PL ICCs were used to generate the
data ($P_j(\theta)$'s in Theorem 2) but then their item parameters were deliberately contaminated
with noise to produce the "recovery ICCs" ($R_j(\theta)$'s in Theorem 2) used to estimate $F$, to
roughly approximate the practical situation in which item parameters themselves must be
estimated from data. Thus the cases Rasch, 3PL, and 'Estimated' represent increasingly
hostile situations for the distribution estimator to work in.
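For concreteness, the 3PL curve and the item parameter layout of the 3PL condition might be coded as below. This is a sketch under the stated design (equally spaced $b_j$, cycling $a_j$, constant $c_j$); the noise contamination of the 'Estimated' condition is deliberately left out, and the function names are hypothetical.

```python
import math
import random

def icc_3pl(theta, a, b, c):
    """3PL ICC: c + (1 - c) / (1 + exp(-a (theta - b)))."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def design_items(J, c=0.2):
    """(a, b, c) triples: b equally spaced on [-2, 2], a cycling 0.5, 1.0, 1.5."""
    return [((0.5, 1.0, 1.5)[j % 3], -2.0 + 4.0 * j / (J - 1), c)
            for j in range(J)]

def simulate_responses(theta, items, rng):
    """Draw one examinee's binary item responses given ability theta."""
    return [1 if rng.random() < icc_3pl(theta, a, b, c) else 0
            for a, b, c in items]

rng = random.Random(1)
items = design_items(30)
responses = simulate_responses(0.0, items, rng)
print(sum(responses) / len(responses))  # proportion correct for this examinee
```

Setting $a = 1$ and $c = 0$ in `icc_3pl` recovers the Rasch curve, so the same function covers both generating models.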

Finally, the choice of $N = 5{,}000$ examinees was somewhat arbitrary. In preliminary runs,
$N = 1{,}000$ and $N = 10{,}000$ yielded measures of fit of the estimated ability distribution to the
true distribution quite comparable to those reported here. The main difference was in the
variances of our estimated measures of fit. $N = 5{,}000$ was chosen because at that level the
variance is much better than at $N = 1{,}000$ and not much worse than that at $N = 10{,}000$.

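The basic estimators used to compare recovery of $F$ from case to case were the empirical
distribution function (EDF)

$$F_{N,J}(t) = \frac{1}{N}\sum_{n=1}^{N} 1_{\{\hat\theta_J^{*}(\underline{X}_{Jn}) \le t\}}$$

and the kernel distribution estimator (KDE)

$$F_{N,J,h}(t) = \frac{1}{N}\sum_{n=1}^{N} K\!\left(\frac{t - \hat\theta_J^{*}(\underline{X}_{Jn})}{h}\right),$$

where

$$\hat\theta_J^{*}(\underline{X}_J) = \bar{P}_J^{-1}\!\left(\frac{J\,\bar{X}_J + 1}{J + 2}\right)$$

(and $K$ and $h$ are as described in (8) and (9) above). Each of these distribution estimators
is consistent for the true $\theta$ distribution, by application of Theorem 1 through Theorem 3.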
For each simulated data set, sample means and standard deviations for estimates of

$$\mathrm{RMS} \equiv \left\{ E \int_{-\infty}^{\infty} [F_{est}(t) - F(t)]^2\, g(t)\, dt \right\}^{1/2}$$

are reported. In addition, mean estimates of

$$\mathrm{MAX} = E[\sup\{|F_{est}(t) - F(t)| : -\infty < t < \infty\}]$$

and the average value $\mathrm{LOC} = t_{max}$ at which MAX is attained are reported. (Note: $F_{est}$
stands for either of the distribution estimators above.) In general the weighting function $g$
should be chosen to reflect our interests in the $\theta$ distribution $F$: $g$ should give more weight
to areas of $F$ that should be well-estimated and less weight to areas of $F$ for which we are
willing to tolerate less good estimation. In these simulations, the weighting function $g$ was
taken to be the standard normal density: some weight is given to estimating $F$ well at all
$\theta$'s, but more weight is given to estimating $F$ well near $\theta = 0$. More details about these
distances and the methods of calculation can be found in Appendix A below.
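For a single Monte Carlo trial, these fit measures might be approximated on a grid as in the sketch below. The grid from $-6$ to $6$ in steps of 0.01 follows Appendix A; the Riemann-sum integration, the single-trial (rather than averaged) quantities, and the function names are assumptions made for illustration.

```python
import math
from statistics import NormalDist

GRID = [-6.0 + 0.01 * i for i in range(1201)]  # t_0, ..., t_1200
STEP = 0.01
WEIGHT = NormalDist()  # weighting density g = standard normal

def fit_measures(f_est, f_true):
    """Return (weighted root integrated squared error, max |deviation|,
    grid point where the maximum is attained) for one pair of functions."""
    squared, max_dev, loc = 0.0, -1.0, None
    for t in GRID:
        dev = f_est(t) - f_true(t)
        squared += dev * dev * WEIGHT.pdf(t) * STEP
        if abs(dev) > max_dev:
            max_dev, loc = abs(dev), t
    return math.sqrt(squared), max_dev, loc

true_cdf = NormalDist(0.0, 1.0).cdf
shifted_cdf = NormalDist(0.1, 1.0).cdf  # an estimate with a location error of 0.1
rms, max_dev, loc = fit_measures(shifted_cdf, true_cdf)
print(rms, max_dev, loc)
```

Averaging the first two outputs over the Monte Carlo trials gives quantities comparable to the RMS and MAX columns, and averaging the third gives LOC.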
Test                     RMS                   Deviation
Length   Estimator    ave       SD          MAX       LOC

 10      EDF          0.04655   0.00002     0.11021    0.37694
         KDE          0.02318   0.00003     0.03812    0.89134
 30      EDF          0.01692   0.00001     0.04032    0.09754
         KDE          0.00887   0.00002     0.01447    0.23184
 60      EDF          0.00984   0.00002     0.02510    0.07844
         KDE          0.00652   0.00002     0.01076    0.05334
100      EDF          0.00731   0.00002     0.01895   -0.02856
         KDE          0.00577   0.00002     0.00965   -0.07616

Table 2: $\Theta \sim N(0, 1)$, Rasch



Test                     RMS                   Deviation
Length   Estimator    ave       SD          MAX       LOC

 10      EDF          0.07015   0.00002     0.15724   -1.00076
         KDE          0.05158   0.00003     0.09368   -1.23646
 30      EDF          0.02794   0.00002     0.06418   -0.77476
         KDE          0.02176   0.00002     0.03755   -1.26626
 60      EDF          0.01521   0.00002     0.03527   -0.46316
         KDE          0.01251   0.00002     0.02109   -1.05756
100      EDF          0.01035   0.00002     0.02463   -0.33196
         KDE          0.00907   0.00003     0.01532   -0.80926

Table 3: $\Theta \sim N(0, 1)$, 3PL
Test                     RMS                   Deviation
Length   Estimator    ave       SD          MAX       LOC

 10      EDF          0.09665   illegible   0.22175   -0.74996
         KDE          0.08412   illegible   0.13431   -1.21956
 30      EDF          0.05695   illegible   0.11573   -0.67436
         KDE          0.05439   illegible   0.08258   -0.89616
 60      EDF          0.01835   illegible   0.04188   -0.70396
         KDE          0.01645   illegible   0.02802   -1.10236
100      EDF          0.01823   illegible   0.03782   -0.49826
         KDE          0.01767   illegible   0.02668   -0.79636

Table 4: $\Theta \sim N(0, 1)$, Estimated

From Tables 2, 3 and 4, it is clear that smoothing in the KDE is helping, especially with
short tests. In comparing Tables 2 and 3 it is clear that the presence of the nonzero lower
asymptote $c$ is degrading the fits. This can be seen both in the larger RMS values and in
the movement of LOC, the location of the maximum deviation between $F_{est}$ and $F$, toward
negative values. Finally, comparison of Tables 3 and 4 indicates that using 'noisy' ICCs
somewhat degrades the recovery of $F$.

Figure 1 illustrates the performance of the estimators in Table 3. The first three panels
are probability-probability (p-p) plots of the estimated $\theta$ distribution (vertical axis) against
the true $\theta$ distribution (horizontal axis), for 10, 30 and 60 items. Each panel depicts one
of the 100 Monte Carlo trials for the corresponding line of Table 3. The step functions
represent the EDF estimator and the smooth curve represents the KDE estimator. The
closer each is to the solid diagonal line, the better the true probabilities of the $\theta$ distribution
are estimated. In particular for 30 or 60 items, estimated probabilities are quite close to true
probabilities. The story is very similar for the performance of the estimators in Tables 2, 5
and 6 (see also Figure 3). The fourth panel in Figure 1 compares the density derived from
the KDE estimator in panel three with the true $\theta$ density (some excessive bumpiness in
the estimated density is attributable to the fact that the "window width" $h$ was chosen to
make a good distribution estimate rather than to make a good density estimate).

[Figure 1 panels: Theta ~ Normal, 3PL, 10 Items; Theta ~ Normal, 3PL, 30 Items;
Theta ~ Normal, 3PL, 60 Items (p-p plot); Theta ~ Normal, 3PL, 60 Items (density).]

Figure 1: p-p and density plots of EDF and KDE estimators. EDF is represented by step
function, KDE by curve. In the last panel, the true density is the dashed curve and the
KDE-based density estimate is the solid curve.



[Figure 2 panels: Theta ~ Normal, Estimated, 30 Items (p-p plot);
Theta ~ Normal, Estimated, 30 Items (density).]

Figure 2: p-p and density plots of EDF and KDE estimators. EDF is represented by step
function, KDE by curve. In the second panel, the true density is the dashed curve and the
KDE-based density estimate is the solid curve.

Figure 2 illustrates the performance of the estimators in Table 4. The left panel is a
p-p plot of the EDF (step function) and KDE (smooth curve) estimators for 30 items, and
the right panel compares the corresponding KDE-based density with the true $\theta$ density. In
the Monte Carlo trial illustrated, contamination in the parameters of the "recovery" ICCs
caused some bias and scale distortion in the estimated distribution, but the estimate still
correctly suggests that $\Theta$ has a Normal or bell-shaped distribution.

In Tables 5, 6 and 7, in which $\Theta$ is bimodal, the KDE estimator is still doing better
than the EDF. It is encouraging to see that the orders of magnitude of the RMS and MAX
measures of fit are the same as in the $N(0, 1)$ case above. It is a little surprising that the
fits can actually be better for the bimodal cases than the normal, but perhaps the greater
variability is working in our favor here: we are getting more extreme-ability examinees with
which to form $F_{est}$ and thus to estimate the tails of $F$. Finally, note that there is much less
difference in the fits of the 3PL and 'Estimated' 3PL cases.
Test                     RMS                   Deviation
Length   Estimator    ave       SD          MAX       LOC

 10      EDF          0.04769   0.00003     0.12379   -1.36996
         KDE          0.03678   0.00003     0.06299   -1.25226
 30      EDF          0.01820   0.00003     0.04668   -0.61856
         KDE          0.01547   0.00003     0.02502   -0.42646
 60      EDF          0.01107   0.00003     0.02710   -0.25206
         KDE          0.00995   0.00003     0.01622   -0.17576
100      EDF          0.00870   0.00003     0.01923   -0.03886
         KDE          0.00817   0.00003     0.01290   -0.13216

Table 5: $\Theta \sim$ Bimodal, Rasch



Test                     RMS                   Deviation
Length   Estimator    ave       SD          MAX       LOC

 10      EDF          0.05268   0.00003     0.12160    1.08084
         KDE          0.03612   0.00003     0.09342   -4.44996
 30      EDF          0.02268   0.00002     0.05616   -0.66696
         KDE          0.01877   0.00002     0.04229   -3.68386
 60      EDF          0.01353   0.00003     0.03496   -1.24996
         KDE          0.01205   0.00003     0.02561   -2.75386
100      EDF          0.00998   0.00003     0.02457   -1.22086
         KDE          0.00924   0.00003     0.01860   -2.64946

Table 6: $\Theta \sim$ Bimodal, 3PL



Figure 3 illustrates the performance of the estimators in Table 6, for 60 items. Again,
the left panel is a p-p plot of the EDF (step function) and KDE (smooth curve) estimators
and the right panel depicts the KDE-based density estimate. Once again the estimated
distribution provides good estimates of probabilities under the true distribution, and the
corresponding density estimate tracks the two modes of the true $\theta$ distribution reasonably
well.
Test                     RMS                   Deviation
Length   Estimator    ave       SD          MAX       LOC

 10      EDF          0.06587   illegible   0.14624    0.78714
         KDE          0.05101   illegible   0.09497   -4.97589
 30      EDF          0.03203   illegible   0.08038   -2.37405
         KDE          0.02958   illegible   0.06457   -3.38695
 60      EDF          0.01386   illegible   0.03747   -1.11546
         KDE          0.01245   illegible   0.02796   -2.63776
100      EDF          0.01120   illegible   0.02776   -1.42786
         KDE          0.01055   illegible   0.02134   -2.29616

Table 7: $\Theta \sim$ Bimodal, Estimated



[Figure 3 panels: Theta ~ Bimodal, 3PL, 60 Items (p-p plot);
Theta ~ Bimodal, 3PL, 60 Items (density).]

Figure 3: p-p and density plots of EDF and KDE estimators. EDF is represented by step
function, KDE by curve. In the second panel, the true density is the dashed curve and the
KDE-based density estimate is the solid curve.
In Tables 8, 9 and 10, note how gradual the decrease in MAX is; this can be attributed
partly to the fact that $\hat\theta_J^{*}$ "doesn't know" that $F$ assigns no mass to the interval $(-\infty, -1)$
and thus freely places $\hat\theta$'s there, so that $F_{est}$ is grossly overestimating $F$ for $\theta < -1$. This
certainly explains why LOC is near -1 in all but one case. It seems remarkable that the
RMS should drop as much as it does, considering the fact that the Normal weighting function
$g$ assigns significant weight to the region near or below $\theta = -1$. Once again there is little
difference between the 3PL and 'Estimated' 3PL cases. Finally, note that the EDF estimator
is doing better than the KDE estimator in many cases here. Our ad hoc choice of $h$ is
probably failing us here by being too large to track the "sharp upturn" in $F$ at -1.



Test                     RMS                   Deviation
Length   Estimator    ave       SD          MAX       LOC

 10      EDF          0.09922   0.00004     0.23352   -0.26996
         KDE          0.09241   0.00003     0.20600   -1.00996
 30      EDF          0.05404   0.00003     0.14608   -0.91796
         KDE          0.05508   0.00003     0.17924   -1.00996
 60      EDF          0.03812   0.00003     0.15993   -1.00996
         KDE          0.04010   0.00003     0.16010   -1.00316
100      EDF          0.02944   0.00003     0.15246   -0.99996
         KDE          0.03215   0.00003     0.14717   -0.99996

Table 8: $\Theta \sim \chi^2_1 - 1$, Rasch



5 Discussion 

To implement this scheme in practice, one must numerically invert the average ICC Pj for 
the test in question at or near the J+l possible values of Xj. After this, a table constructed 
from the inversion can be used simply and cheaply to estimate 8 distributions for each 
of several administrations of the same test, or each of several subpopulations in a single 
administration. For shorter tests lengths the basic statistic h may need to be reseated, 
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Test 
Length 


Estimator 


RMS 
ave SD 


Deviation 
MAX LOC 


10 


EDF 
KDE 


0.11871 0.1 
0.10699 0.1 


m, 
33 


4 
4 


0.30689 -1.00996 
0.28934 -1.00996 


30 


EDF 
KDE 


0.07276 0. 
0.07188 0. 


*** 


4 
14 


0.22700 -1.00996 
0.23167 -i.00996 


60 


EDF 
KDE 


0.05291 0. 
0.05408 0. 


m 


13 
13 


0.20477 -1.00996 
0.20211 -1.00996 


100 


EDF 
KDE 


0.04153 0. 
0.04365 0. 


oooc 


13 
13 


0.19628 -0.99996 
0.18294 -1.00976 



Table 9: 0~* 2 -l,3PL 



Teat 
Length 


Estimator 


RMS 
ave SD 


Deviation 
MAX LOC 


10 


EDF 
KDE 


0.11387 0.00005 
0.10600 0. 00005 


0.30689 -1.00996 
0.33073 -1.00996 


30 


EDF 
KDE 


0.08264 0.00005 
0.08161 0.00005 


0.32359 -1.00996 
0.30244 -1.00996 


60 


EDF 
KDE 


0.05322 0.00003 
0.05466 0.00004 


0.20477 -1.00996 
0.21590 -1.00996 


100 


EDF 
KDE 


0.04303 0.00004 
0.04491 0.00004 


0.20150 -1.00996 
0.20859 -1.00646 



Table 10: 6 ~ x 2 - 1, Estimated 

as we have done with $\theta^{[J]}$, to effectively estimate $F$. Kernel smoothing of the estimated
distribution (KDE) is also quite helpful. Work is currently underway (Nandakumar and
Junker, 1992) to further examine and refine these methods using essentially unidimensional
simulation data, and to apply the estimators to real tests.
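
The inversion step described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the implementation used for the simulations: the Rasch form of the ICC, the item difficulties, the bracket [-6, 6], the bisection solver, and all function names are choices made here for the example. Because the average ICC never reaches 0 or 1, the extreme scores are mapped to the ends of the bracket, which is one possible reading of inverting "at or near" the $J+1$ values.

```python
import math

def rasch_icc(theta, b):
    """Rasch item characteristic curve with difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def average_icc(theta, difficulties):
    """Average ICC over the J items of the test."""
    return sum(rasch_icc(theta, b) for b in difficulties) / len(difficulties)

def invert_average_icc(p, difficulties, lo=-6.0, hi=6.0, tol=1e-10):
    """Solve average_icc(theta) = p by bisection, clipped to [lo, hi]."""
    if p <= average_icc(lo, difficulties):
        return lo
    if p >= average_icc(hi, difficulties):
        return hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if average_icc(mid, difficulties) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def build_inversion_table(difficulties):
    """One inverted value for each possible proportion correct k/J."""
    J = len(difficulties)
    return [invert_average_icc(k / J, difficulties) for k in range(J + 1)]

def edf_estimate(scores, table, t):
    """Proportion of examinees whose inverted score is at most t."""
    return sum(1 for x in scores if table[x] <= t) / len(scores)

# Hypothetical 5-item test and hypothetical number-correct scores.
difficulties = [-1.5, -0.5, 0.0, 0.5, 1.5]
table = build_inversion_table(difficulties)
scores = [0, 1, 2, 2, 3, 3, 3, 4, 5, 5]
estimate_at_zero = edf_estimate(scores, table, 0.0)
```

Once `table` has been built, it can be reused for any other set of scores from the same test, which is where the cost saving comes from.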

Because it is fast, this scheme could also be used for some diagnostic purposes. For
example, if ICCs were estimated under the assumption of a Normal underlying $\theta$ distribution
and a 3PL model, the KDE estimate of the $\theta$ distribution could be plotted on a Normal
probability plot to examine (jointly) the assumptions about distribution and ICC forms. Or
the $\theta$ distribution estimates under two ICC estimation techniques could be compared to see
how well they agree. Quite different ICC forms or parameter sets could in principle lead
to very similar $\theta$ distributions; if so, then for many purposes it would be a matter of
indifference which ICCs were used, and considerations such as the cost of ICC estimation
could come into play. Finally, it may be possible to estimate the $\theta$ distribution sufficiently
accurately with, say, Rasch ICCs (for which item parameters can be estimated independently
of the $\theta$ distribution), and then use that estimate as part of a marginal maximum likelihood
approach to estimating item parameters in a 3PL model, which more accurately models the
item response behavior.

Appendix A Details of the simulation

For each simulated data set, $M$ Monte Carlo trials were run (one trial entails sampling $N$ examinees,
generating a $\theta$ and $J$ item responses for each examinee, and constructing the distribution estimates
$\hat F_{NJ}$ and $\hat F_{NJh}$ from these). In our simulation, $M$ was taken to be 100. In the discussion below,
$F_{est}$ stands for either of the two distribution estimates tried.

For each trial, two measures of fit to the true ability distribution $F$ were reported. First, the
value of

$$S = \max\{ |F_{est}(t_i) - F(t_i)| : i = 0, \ldots, 1200 \}$$

was calculated, for $t_i$'s ranging from $-6$ to $6$ spaced at 0.01 intervals, as an approximation to

$$\sup\{ |F_{est}(t) - F(t)| : t \in (-\infty, \infty) \},$$

as well as the value $L = t_{\max}$ at which $S$ was attained. Second, an approximation to the squared
distance

$$I^2 = \int_{-\infty}^{\infty} [F_{est}(t) - F(t)]^2 g(t)\,dt$$

was calculated, where the weight function $g$ was taken to be the standard normal density. The
approximation used was the Monte Carlo approximation

$$I^2 \approx \frac{1}{K} \sum_{k=1}^{K} [F_{est}(T_k) - F(T_k)]^2,$$

where $T_1, \ldots, T_K$ are iid with marginal density $g$, and $K = 500$ for our simulation.
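
As a rough illustration of the two fit measures for a single trial, the sketch below evaluates the maximum deviation on the 1201-point grid from -6 to 6 and the Monte Carlo squared distance with standard normal draws. The function names, the use of Python's `random` module, and the example pair of distributions (a standard normal "truth" against a normal shifted by 0.1) are assumptions made here; they are not taken from the simulation code.

```python
import math
import random

def normal_cdf(t, mean=0.0):
    """CDF of a unit-variance normal distribution with the given mean."""
    return 0.5 * (1.0 + math.erf((t - mean) / math.sqrt(2.0)))

def max_deviation(f_est, f_true, lo=-6.0, step=0.01, n_points=1201):
    """Return (S, L): the largest grid deviation and the grid point attaining it."""
    best_s, best_t = -1.0, lo
    for i in range(n_points):
        t = lo + step * i
        d = abs(f_est(t) - f_true(t))
        if d > best_s:
            best_s, best_t = d, t
    return best_s, best_t

def mc_squared_distance(f_est, f_true, k=500, rng=None):
    """Monte Carlo estimate of the g-weighted squared distance, g standard normal."""
    rng = rng or random.Random()
    total = 0.0
    for _ in range(k):
        t = rng.gauss(0.0, 1.0)  # T_k drawn from the weight density g
        total += (f_est(t) - f_true(t)) ** 2
    return total / k

# Hypothetical example: the "estimate" is the true CDF shifted by 0.1.
f_true = normal_cdf
f_est = lambda t: normal_cdf(t, mean=0.1)
s, loc = max_deviation(f_est, f_true)
i2 = mc_squared_distance(f_est, f_true, k=500, rng=random.Random(0))
```

Averaging `s`, `loc`, and `i2` over the Monte Carlo trials would then give the summary quantities reported in the tables.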

Finally, Monte Carlo sample averages

$$\bar S = \frac{1}{M} \sum_{m=1}^{M} S_m, \qquad \bar L = \frac{1}{M} \sum_{m=1}^{M} L_m, \qquad \bar I^2 = \frac{1}{M} \sum_{m=1}^{M} I_m^2$$

were computed, as well as sample standard deviations. $\bar S$ estimates $E(S)$, $\bar L$ estimates $E(L)$, and
$\bar I$ estimates $\{E(I^2)\}^{1/2}$; the standard deviation for $\bar I$ was estimated using the delta method (Serfling,
1980, p. 118).
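
For reference, the delta-method step can be written out as follows. This is the standard first-order argument applied to the square-root function, sketched under the assumption that $\bar I$ is the square root of the Monte Carlo mean of the $I_m^2$; the symbol $s_{I^2}$, denoting the sample standard deviation of the $I_m^2$, is a label introduced here.

```latex
\bar I = \Bigl( \frac{1}{M} \sum_{m=1}^{M} I_m^2 \Bigr)^{1/2},
\qquad
\frac{d}{dy}\sqrt{y} = \frac{1}{2\sqrt{y}}
\quad\Longrightarrow\quad
\widehat{\mathrm{SD}}(\bar I) \approx \frac{1}{2\,\bar I} \cdot \frac{s_{I^2}}{\sqrt{M}} .
```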

$E(S)$ may be regarded as a reasonable approximation to MAX $= E[\sup_t |F_{est}(t) - F(t)|]$. Because of the
discretization in calculating $S$ and $L$, $E(L)$ probably is not as good an indication of the true value
LOC $= t$ where the distributions are farthest apart, but it may still be of some descriptive value.
Finally, $\{E(I^2)\}^{1/2}$ is exactly

$$\mathrm{RMS} = \Bigl\{ E \int_{-\infty}^{\infty} [F_{est}(t) - F(t)]^2 g(t)\,dt \Bigr\}^{1/2}.$$

The pseudo-random number generators used were linear congruential generators (see Rubinstein,
1981)

$$r_n = (a \cdot r_{n-1} + c) \bmod m,$$

using $a = 7^5$, $c = 0$, $m = 2^{31}$ for generating $\theta$'s and $a = 2^7 + 1$, $c = 1$, $m = 2^{35}$ for generating
item responses. Normal observations were obtained from these uniform observations by the polar
transformation

$$Z_1 = \sqrt{-2 \log U_1}\, \cos 2\pi U_2$$

$$Z_2 = \sqrt{-2 \log U_1}\, \sin 2\pi U_2$$

and the bimodal mixture and $\chi^2 - 1$ observations were taken to be appropriate transformations of
these. Pseudo-random values obtained using these transformations do exhibit some lattice structure,
but this was not considered a problem for our calculations, which are essentially all Monte Carlo
integrations.
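
A minimal sketch of the two ingredients just described, a linear congruential generator and the polar transformation, is given below. The multipliers, increments, and moduli are the ones quoted above; the seeds, the function names, and the decision to draw $U_1$ and $U_2$ as consecutive values from a single stream are assumptions made for this illustration.

```python
import math

def lcg(seed, a, c, m):
    """Yield uniforms in [0, 1) from r_n = (a * r_{n-1} + c) mod m."""
    r = seed
    while True:
        r = (a * r + c) % m
        yield r / m

def polar_normals(uniforms):
    """Yield standard normal deviates, two per pair of uniforms (U1, U2)."""
    while True:
        u1 = next(uniforms)
        u2 = next(uniforms)
        if u1 <= 0.0:
            continue  # log(0) is undefined, so skip this pair
        radius = math.sqrt(-2.0 * math.log(u1))
        yield radius * math.cos(2.0 * math.pi * u2)
        yield radius * math.sin(2.0 * math.pi * u2)

# Constants quoted in the text; the seeds are arbitrary choices made here.
theta_uniforms = lcg(seed=12345, a=7**5, c=0, m=2**31)
response_uniforms = lcg(seed=1, a=2**7 + 1, c=1, m=2**35)

normals = polar_normals(theta_uniforms)
sample = [next(normals) for _ in range(10000)]
sample_mean = sum(sample) / len(sample)
sample_var = sum((z - sample_mean) ** 2 for z in sample) / (len(sample) - 1)
```

With a zero increment and a power-of-two modulus, an odd seed keeps the state from collapsing to zero, which is why the first seed above is odd.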

Appendix B Proofs

Proof of Theorem 1: Observe that, for any $\epsilon > 0$,

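$$P[|\hat F_{NJ}(\theta) - F(\theta)| > \epsilon] \le P[|\hat F_{NJ}(\theta) - F_J(\theta)| + |F_J(\theta) - F(\theta)| > \epsilon]$$

$$\le P[|\hat F_{NJ}(\theta) - F_J(\theta)| > \epsilon/2] \quad \text{(for large } J\text{)}$$

$$\le C e^{-2N(\epsilon/2)^2}$$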

for some universal constant $C$, and $N$ large (Serfling, 1980, p. 59). This tends to zero as $N \to \infty$.

Proof of Theorem 2: Observe that

« pfFj^Xj)^?; 1 ^)] 

= P{FJ ! (Xj) + r(0) - TJ^Ae) < r(0)]. 

By Slutsky's Theorem, since t{$) = limj_^o FJ 1 ]^) we ^ow that 7J i (Xj)+r(^) and ^(Xj) 
have the same asymptotic law, i.e. for any f. 

PPFftXj) + r(0) - <*]- F(t). 

Then in particular for $t = \tau(\theta)$,

P\T} l (Xj) + r{9) - ?j{0)Hj($) < r($)) - F(r(0)). 

The assertion about uniform convergence follows from a theorem of Pólya (Serfling, 1980, p. 18). □

Proof of Theorem 3: In the following calculation, it will be helpful to let $Y$ be a random variable
with distribution $K$ independent of $\theta$ and all item responses. Squaring (6),

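$$\mathrm{RMS}^2 = E \int_{-\infty}^{\infty} [\hat F_{NJh}(t) - F(t)]^2 g(t)\,dt$$

$$= \int_{-\infty}^{\infty} E \Bigl\{ \frac{1}{N} \sum_{n=1}^{N} K\Bigl[ \frac{t - \bar P_J^{-1}(\bar X_{J,n})}{h} \Bigr] - P[\Theta \le t] \Bigr\}^2 g(t)\,dt$$

$$= \int_{-\infty}^{\infty} \bigl\{ [\mathrm{bias}(t)]^2 + [\mathrm{variance}(t)] \bigr\}\, g(t)\,dt$$

$$= \int_{-\infty}^{\infty} \Bigl\{ E\,K\Bigl[ \frac{t - \bar P_J^{-1}(\bar X_J)}{h} \Bigr] - P[\Theta \le t] \Bigr\}^2 g(t)\,dt + \frac{1}{N} \int_{-\infty}^{\infty} \mathrm{Var}\,K\Bigl[ \frac{t - \bar P_J^{-1}(\bar X_J)}{h} \Bigr] g(t)\,dt$$

$$= \int_{-\infty}^{\infty} \bigl\{ P[\bar P_J^{-1}(\bar X_J) + hY \le t] - P[\Theta \le t] \bigr\}^2 g(t)\,dt + \frac{1}{N} \int_{-\infty}^{\infty} \mathrm{Var}\,K\Bigl[ \frac{t - \bar P_J^{-1}(\bar X_J)}{h} \Bigr] g(t)\,dt$$

$$\equiv (\mathrm{bias})^2_{Jh} + (\mathrm{variance})_{NJh}.$$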

Note that $(\mathrm{bias})^2_{Jh}$ does not depend on $N$. As long as $h \to 0$
we will have $hY \to 0$ in probability, so that by Slutsky's Theorem the distributions of $\bar P_J^{-1}(\bar X_J) + hY$
and $\bar P_J^{-1}(\bar X_J)$ will converge to the same thing, namely $F(t) = P[\Theta \le t]$, at every $t$ (we are assuming
$F$ is continuous) as $J \to \infty$, $N \to \infty$ and $h \to 0$. Hence the integrand of $(\mathrm{bias})^2_{Jh}$ converges to
zero at each $t$, and if $g(t)$ is a density it follows that $(\mathrm{bias})^2_{Jh} \to 0$ as $J \to \infty$ and $h \to 0$ (and $N$
is free).

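On the other hand, for each fixed $J$, $h$, $t$ the random variable

$$K\Bigl[ \frac{t - \bar P_J^{-1}(\bar X_J)}{h} \Bigr]$$

is bounded between 0 and 1; hence if $g(t)$ is a density we have for each fixed $J$ and $h$

$$\int_{-\infty}^{\infty} \mathrm{Var}\,K\Bigl[ \frac{t - \bar P_J^{-1}(\bar X_J)}{h} \Bigr] g(t)\,dt \le 1.$$

Multiplying by $1/N$ it is clear that $(\mathrm{variance})_{NJh} \to 0$ as $N \to \infty$ uniformly in $J$ and $h$. This
proves Theorem 3. □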
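
To make the object of Theorem 3 concrete, the sketch below evaluates a kernel-smoothed distribution estimate as the average of $K[(t - \hat\theta_n)/h]$ over examinees, where $\hat\theta_n$ stands for the inverted score of examinee $n$. The choice of the standard normal CDF for $K$, the function names, and the sample values are assumptions made here for illustration; the kernel used in the simulations is not specified in this section.

```python
import math

def normal_cdf(x):
    """Standard normal CDF, used here as the kernel distribution K."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def smoothed_cdf(t, theta_hats, h, kernel_cdf=normal_cdf):
    """Average of K((t - theta_hat_n) / h) over the N examinees."""
    return sum(kernel_cdf((t - th) / h) for th in theta_hats) / len(theta_hats)

def empirical_cdf(t, theta_hats):
    """Unsmoothed estimate: proportion of inverted scores at most t."""
    return sum(1 for th in theta_hats if th <= t) / len(theta_hats)

# Hypothetical inverted scores for five examinees.
theta_hats = [-1.2, -0.4, 0.0, 0.3, 0.9]
smooth_at_half = smoothed_cdf(0.5, theta_hats, h=0.3)
edf_at_half = empirical_cdf(0.5, theta_hats)
```

As the bandwidth `h` shrinks, `smoothed_cdf` approaches `empirical_cdf` at every point that is not one of the inverted scores, which mirrors the role of $hY \to 0$ in the proof.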